docs: adding translation stats to docs #511
base: main
This PR is related to #493 -- second step declared in #493 (comment)
The solution involves creating a Sphinx extension that reads in the JSON data generated, and creates a Plotly graph (with plotlyjs) and embeds it into the docs. Full statistics reports for each module are shown when hovering over the bar (as a function of the locale and module). This uses the hover tooltip to render properly. Hope you like it! Looking forward to feedback related to it @lwasser!
@RobPasMue this is awesome!! Here is my one question, and then I'll provide a suggestion, but let's see what @flpm and @sneakers-the-rat think! I suspect that because translating.md is NOT actually a part of our Sphinx guide, people won't be able to see the graphic that you made in the page (i could be wrong but it looks like it needs to render plotly). So as a middle ground, could we do the following
y'all - let me know how that lands!
))

# Create figure
fig = go.Figure(data=traces)
@RobPasMue could we create a grid of plots - one for each language?
Then each plot could have 3 bars - one for fuzzy, one for complete and one for incomplete (or it could be stacked bars too).
What you have now is awesome, but if we add more languages it will get complex over time. And a static version of the plot would be nice too.
Sure sounds good!
I think a good idea would be a heat map, which is condensed enough we can add many more languages.
agreed on the heatmap. i would expect it oriented with languages as rows and pages as columns (which satisfies the need for expandability to future languages)
# Create figure
fig = go.Figure(data=traces)
fig.update_layout(
Could the plot please use our pyOS colors?
Dark Purple: #33205c
Light Purple: #735fab
Pale Purple: #bab3d4
Magenta: #bb82b0
Sea Green: #81c0aa
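One way to honor this request is to turn the palette into a Plotly-style colorscale, which is a list of `[position, color]` pairs with positions running from 0 to 1. The sketch below is only an illustration: the helper name `make_colorscale` and the choice and ordering of colors (pale for low values, dark for high values) are assumptions, not something decided in this thread.

```python
# Hex values copied from the comment above.
PYOS_COLORS = {
    "dark_purple": "#33205c",
    "light_purple": "#735fab",
    "pale_purple": "#bab3d4",
    "magenta": "#bb82b0",
    "sea_green": "#81c0aa",
}


def make_colorscale(colors):
    """Spread the given colors evenly over [0, 1] as [position, color] pairs."""
    if len(colors) < 2:
        raise ValueError("a colorscale needs at least two colors")
    step = 1 / (len(colors) - 1)
    return [[round(i * step, 4), color] for i, color in enumerate(colors)]


# Assumed ordering: pale for low completion, dark for high completion.
PYOS_COLORSCALE = make_colorscale([
    PYOS_COLORS["pale_purple"],
    PYOS_COLORS["light_purple"],
    PYOS_COLORS["dark_purple"],
])
print(PYOS_COLORSCALE)
```

A list in this shape could then be handed to any trace that accepts a `colorscale` argument.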
Most def! =) I'll look into the colors as soon as I can
oh whoops. just saw this comment. probably would be good to use the css variables directly when we can to avoid having them hardcoded in multiple places. i couldn't find a rhyme or reason to when i was able to use css vars in the plotly values and when i needed to declare them in the stylesheet, but ya some examples in this comment
This looks so good - i just suggested a few changes. We could add translating, contributing etc to the guidebook as pages in another pr as well if we want to merge this. I just worry that the beautiful work done here won't render until we add these pages to the guide (i could be wrong!).
Hi @lwasser -- just to clarify, the translation page is available in our published docs: https://www.pyopensci.org/python-package-guide/TRANSLATING.html I think we should just link it properly to the landing page =) so that it is no longer an orphan as you mentioned.
Nonetheless, this is possible if y'all prefer. Although my feeling is that only people interested in the translation would be curious about this information. So, IMO, the best location would be the TRANSLATING.md file. But I'm up for discussion! =)
Wow, that's pretty neat! 🤩 I like the idea of having an interactive visualization that people can consult and the Translation guide feels like a natural place. We should add a link to the live version on the site inside TRANSLATING.md, I imagine most users will look at the md in their own clones or in GitHub and they will not see the chart at first. The link will be a quick way to get them to the real data.
On the visualization itself: I am not sure the bar chart is the best approach to show this data. In my head it feels more natural to imagine it as a heat map, where the rows are the files, the columns are the languages. In the heat map each cell would show the % complete and be colored with the proper intensity. I think the main advantage of a heat map is that empty cells are clearly defined and will be easier to spot than missing bars.
I would also include English (hard coded at 100% in all cells), as a first column. And order the languages from most done to least done (currently, JA then ES). But I have never used plotly so I am not sure how much work that would be, we could always have that as a future improvement in a separate issue.
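The ordering idea (English first, then languages from most to least complete) can be sketched without any plotting code. This assumes the statistics are a mapping of locale to module to a dict with a `percentage` key, as in the extension code under review; the function name `order_locales` and the use of a plain average over modules are assumptions made for illustration.

```python
from statistics import mean


def order_locales(data, source_locale="en"):
    """Return locale codes with the source first, then most to least complete."""
    def completion(locale):
        percentages = [stats["percentage"] for stats in data[locale].values()]
        return mean(percentages) if percentages else 0.0

    translated = [loc for loc in data if loc != source_locale]
    # list.sort is stable, so locales with equal averages keep their input order
    translated.sort(key=completion, reverse=True)
    return [source_locale] + translated


# Made-up numbers, only to show the shape of the data
example = {
    "es": {"index": {"percentage": 40.0}, "tutorials": {"percentage": 10.0}},
    "ja": {"index": {"percentage": 90.0}, "tutorials": {"percentage": 70.0}},
}
print(order_locales(example))
```

Averaging per-module percentages treats every file equally; weighting by the number of strings in each file would be another reasonable choice.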
```{translation-graph}
```
Maybe here would be a good spot to include the link to the site
seems like we would want both directions? from the contributing page here and vice versa?
My apologies, @RobPasMue you are correct, i saw that it was flagged orphan but now i realize it's just a matter of adding a link to it from the guide somewhere. Please ignore my comment. I'll defer to @flpm for what the final plots look like!! I do want the ability to add more languages in the future. And the ability for users to easily identify gaps and determine where they can contribute (without needing to run a nox session). Let's add a link to the translating page in a separate PR so you don't have to worry about it here and can focus on the data viz challenge!!
cool, ya nice, i needed a distraction today so i spent some time on the plot. haven't used plotly since it came out and whew somehow it became three different packages or something? anyway it's very slick. put my version of the plot in suggestion.
def run(self):
    # Read the JSON file containing translation statistics
    json_path = Path(__file__).parent.parent / "_static" / "translation_stats.json"
i missed the PR that added the translation_stats script, but i think that this will go out of date almost immediately and become a misleading indicator if we don't generate this during the build process so that at the time the docs are generated the plot and the stats both reflect the same state of the repo.
I personally avoid committing generated data files that only need to exist at deployment time because they make PRs noisy and tempt us to treat them as files we can edit, but if we are to keep it there, we should add it and trigger it from some early build event like builder-inited - see the _post_build and setup functions at the bottom of conf.py
Hi @sneakers-the-rat! Yeah that makes sense - I could move the generation of the JSON file into a build_event. That sounds reasonable.
it took me many years and many attempts of trying before the hook pattern of sphinx sunk in for me, the way you did it was completely understandable
_ext/translation_graph.py
Outdated
def run(self):
    # Read the JSON file containing translation statistics
    json_path = Path(__file__).parent.parent / "_static" / "translation_stats.json"
    with json_path.open("r") as f:
        data = json.load(f)

    # Collect all module names -- iterates over the JSON data in 2 levels
    all_modules = {module for stats in data.values() for module in stats}
    all_modules = sorted(all_modules)

    # Build one trace per locale with full hover info
    traces = []

    for locale, modules in data.items():
        y_vals = []
        hover_texts = []

        for module in all_modules:
            stats = modules.get(module)
            y_vals.append(stats["percentage"])

            hover_text = (
                f"<b>{module}</b><br>"
                f"Translated: {stats['translated']}<br>"
                f"Fuzzy: {stats['fuzzy']}<br>"
                f"Untranslated: {stats['untranslated']}<br>"
                f"Total: {stats['total']}<br>"
                f"Completed: {stats['percentage']}%"
            )
            hover_texts.append(hover_text)

        traces.append(go.Bar(
            name=locale,
            x=all_modules,
            y=y_vals,
            hovertext=hover_texts,
            hoverinfo="text"
        ))

    # Create figure
    fig = go.Figure(data=traces)
    fig.update_layout(
        barmode="group",
        title="Translation Coverage by Module and Locale",
        xaxis_title="Module",
        yaxis_title="Percentage Translated",
        height=600,
        margin=dict(l=40, r=40, t=40, b=40)
    )

    div = plot(fig, output_type="div", include_plotlyjs=True)
    return [nodes.raw("", div, format="html")]
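One thing worth noting about the `modules.get(module)` call in `run` above: if a locale has no entry for some module, `get` returns `None` and the following `stats["percentage"]` lookup would raise a `TypeError`. Whether that can actually happen with the generated JSON is not confirmed in this thread, so the following is only a defensive sketch; the names `EMPTY_STATS` and `stats_for` are made up for illustration.

```python
# Zeroed statistics used when a locale has no entry for a module (assumed fallback).
EMPTY_STATS = {
    "total": 0,
    "translated": 0,
    "fuzzy": 0,
    "untranslated": 0,
    "percentage": 0.0,
}


def stats_for(modules, module):
    """Return the stats for a module, or zeroed stats if the locale lacks it."""
    return modules.get(module) or dict(EMPTY_STATS)


# Made-up numbers for a single locale
locale_stats = {
    "index": {
        "total": 10,
        "translated": 5,
        "fuzzy": 1,
        "untranslated": 4,
        "percentage": 50.0,
    }
}
print(stats_for(locale_stats, "index")["percentage"])
print(stats_for(locale_stats, "tutorials")["percentage"])
```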
# oddly, this is evaluated in the js not python,
# so we treat customdata like a json object
HOVER_TEMPLATE = """
<b>%{customdata.module}</b><br>
Translated: %{customdata.translated}<br>
Fuzzy: %{customdata.fuzzy}<br>
Untranslated: %{customdata.untranslated}<br>
Total: %{customdata.total}<br>
Completed: %{customdata.percentage}%
"""

def run(self):
    # Read the JSON file containing translation statistics
    json_path = Path(__file__).parent.parent / "_static" / "translation_stats.json"
    with json_path.open("r") as f:
        data: TranslationStats = json.load(f)

    # Sort data by locale and module
    data = {locale: dict(sorted(loc_stats.items())) for locale, loc_stats in sorted(data.items())}

    # prepend english, everything set to 100%
    # (fuzzy is 0: the source language has no entries left to review)
    en = {
        module: ModuleStats(total=stats['total'], translated=stats['total'], fuzzy=0, untranslated=0, percentage=100)
        for module, stats in next(iter(data.values())).items()
    }
    data = {'en': en} | data

    # extract data to plot
    locales = list(data.keys())
    modules = list(data[locales[-1]].keys())
    values = [[stats['percentage'] for stats in loc_stats.values()] for loc_stats in data.values()]
    hoverdata = [[{'module': module} | stats for module, stats in loc_stats.items()] for loc_stats in data.values()]

    heatmap = go.Heatmap(
        x=modules,
        y=locales,
        z=values,
        xgap=5,
        ygap=5,
        customdata=np.array(hoverdata),
        hovertemplate=self.HOVER_TEMPLATE,
        colorbar={
            'orientation': 'h',
            'y': 0,
            "yanchor": "bottom",
            "yref": "container",
            "title": "Completion %",
            "thickness": 10,
        },
        colorscale="Plotly3",
    )

    # Create figure
    fig = go.Figure(data=heatmap)
    fig.update_layout(
        paper_bgcolor="rgba(0,0,0,0)",
        plot_bgcolor="rgba(0,0,0,0)",
        font_color="var(--bs-body-color)",
        margin=dict(l=40, r=40, t=40, b=40),
        xaxis_showgrid=False,
        xaxis_side="top",
        xaxis_tickangle=-45,
        xaxis_tickfont={
            "family": "var(--bs-font-monospace)",
            "color": "#fff"
        },
        yaxis_showgrid=False,
        yaxis_title="Locale",
        yaxis_autorange="reversed",
    )

    div = plot(fig, output_type="div", include_plotlyjs=True)
    return [nodes.raw("", div, format="html")]
here ya go, here's the heatmap. works in light and dark mode. i used the blue/pink color scale because a) it's cute, b) it fits in with the rest of the colors, and c) it wasn't altogether obvious to me that yellow means completed :)
only thing i had to do that isn't here is add this to pyos.css, since you can use css vars in some places but not others for some reason.
.plotly svg {
  text {
    fill: var(--pst-color-text-base) !important;
  }
}
other notes:
- flattened out nested iteration by separating data cleaning steps from plot creation steps
- use a hovertemplate and customdata also to separate data cleaning from plotting logic
- for some reason all plotting libraries want to make the default style very ugly with a bunch of gridlines, shaded backgrounds, and so on. so i removed all the unnecessary stuff so it blended into the page.
- make colorbar on bottom, horizontally, so it looks progressbarlike.
we should also probably rescale the colorbar so the difference between "100%" vs "not 100%" is clearer, or we can use a different one, idc. we don't need precise number readout for a plot like this, the purpose i think is just to communicate "which languages are mostly done vs not mostly done and which pages need to be worked on"
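One possible way to make "100%" stand out from "not 100%" is a colorscale with a hard edge near the top: in a Plotly colorscale, repeating the same position with two different colors produces a sharp break and not a blend. The sketch below is an assumption-laden illustration: the function name, the 0.99 threshold, and the pairing of pyOS palette colors are all made up here.

```python
def completion_colorscale(low, high, done, threshold=0.99):
    """Gradient from `low` to `high` below `threshold`, flat `done` color above it."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    return [
        [0.0, low],
        [threshold, high],
        [threshold, done],  # repeated position gives a hard edge, not a blend
        [1.0, done],
    ]


# Colors borrowed from the pyOS palette quoted earlier; the pairing is an assumption.
COLORSCALE = completion_colorscale("#bab3d4", "#735fab", "#81c0aa")
print(COLORSCALE)
```

For the threshold to line up with percentages, the heatmap would also need a fixed range (for example `zmin=0` and `zmax=100`), so that 0.99 on the colorscale corresponds to 99%.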
I guess the last thing to do would be accessibility. since we already have all the data when generating the plot, and we're generating an SVG, we might as well add |
from plotly.offline import plot
from plotly.offline import plot
import numpy as np
oops, forgot that i added this (though this is just trying to follow the docs where it says it must be an array, but i bet it would work fine just as a list of lists)
Co-authored-by: Jonny Saunders <[email protected]>
Thank you for the heatmap implementation @sneakers-the-rat! If y'all like it the way it's rendering (@flpm, @lwasser) I can add the suggestions to the code and then all solved. I will work on the auto-generation of the JSON info as part of #493 (comment) -- we can probably move away from having the persistent file and generate it on the fly. Although it would imply that every time someone wants to build the docs, they have to go through that step... which might be unnecessary. If we have a workflow that updates it regularly, I think we solve our problem -- but I am up for discussion. Bottom line: I can have it implemented either way - either through a Sphinx hook at build time or as a GitHub Actions workflow that updates it on a scheduled basis.
to be clear i'm good with whatever does something in this neighborhood! just as long as we avoid forgetting something and eventually realizing the wonderful colorful square box has been broken for awhile. cosplaying as ornery reviewer who refuses all nice tooling and docs and rehearses the edge cases. Also sorry for making a code suggestion that was just like "here's a whole different thing," that is pretty rude of me. I was just having an afternoon where i needed a bit of time to distract myself and you had set up this great canvas in these two PRs, so thanks for that! you can feel free to take or leave any part of that.
rats, i did forget this. i like nice color sorting and it does make sense. it induces a tiny amount of gamification (the translation scoreboard!) which i think could be both cute and useful as a way of knowing where to target effort. there is a maybe remote chance that it makes someone feel bad or unwittingly participate in linguistic-cultural rivalries. So weigh that as an "i'm aware of this being possible and think it's worth raising but have no estimate of either likelihood or magnitude" vs a higher probability of useful information and tidier color gradients. ~ yielding da floor ~ thanks for ur patience
Oh no, never apologize for that! I really appreciate that you took the time to come up with a whole new implementation of the heatmap! I'm really glad it helped you distract yourself. Coding is the perfect way for it! =) It also helped me understand how to do it with a heatmap too! =)
I fully agree - I'm fine with either option too! As long as the data keeps getting updated haha. I have also suffered from this in the past, so I know the feeling.
Regarding this last point - I understand all points as well. I just went with the default sorting (alphabetical), but sorting based on completeness makes sense as well, to reach out for help on those languages that are not fully there yet. But I'll let @lwasser comment on this last point too =) and @sneakers-the-rat .. thanks for your thorough review. Once again, I really appreciate your efforts to read it through, provide code suggestions and help out on the implementation! =)
I think it looks awesome! It will let us fit many languages into a relatively small screen space. I think we should add the percentage in the cell. In addition to the "identify where the bigger gaps are" use case, there is also keeping the translation up to date as it ages, "spot where new gaps are appearing". When the English text evolves, more and more entries will start to be marked fuzzy and the 100% will drift down to 99%, 98% etc. Without the numbers it will be hard to spot the small changes in the color. Reviewing fuzzy entries in the .PO file is a very easy task too, so this will help new contributors find those opportunities.
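A small sketch of how the per-cell labels could be prepared, assuming `values` is the same matrix of percentages that feeds the heatmap. The helper name `cell_labels` is made up. Percentages are rounded down on purpose, so a file at 99.6% still reads "99%" and the drift away from 100% stays visible.

```python
import math


def cell_labels(values):
    """Format a matrix of completion percentages as strings such as '99%'.

    Values are floored, not rounded, so only fully translated cells show '100%'.
    """
    return [[f"{math.floor(value)}%" for value in row] for row in values]


# Made-up numbers: one row per locale, one column per module
values = [[100, 100], [99.6, 12.5]]
labels = cell_labels(values)
print(labels)
```

In recent Plotly releases, a matrix like this can reportedly be attached to a heatmap through its `text` argument together with `texttemplate="%{text}"`; that part is untested here and should be checked against the installed version.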
Adding a dependency to this package with plotly for providing nice stats graphs.
See rendered docs for demonstration but here is a snapshot